Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions
Authors
Abstract
Character-based embedding models provide robustness to misspellings and typos in natural language. In this paper, we explore convolutional neural network-based embedding models for handling out-of-vocabulary words in a meal description food ranking task. We demonstrate that character-based models combined with a standard word-based model improve the top-5 recall of USDA database food items from 26.3% to 30.3% on a test set of all USDA foods with typos simulated in 10% of the data. We also propose a new reranking strategy for predicting the top USDA food matches given a meal description, which significantly outperforms our prior method of n-best decoding with a finite state transducer, improving the top-5 recall on the all USDA foods task from 20.7% to 63.8%.
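The evaluation setup described in the abstract (typos injected into a fraction of the data, then top-5 recall over ranked food matches) can be illustrated with a minimal sketch. The abstract does not say how the typos were generated, so the adjacent-character swap below is an assumption, and the names `simulate_typos` and `top_k_recall` are hypothetical helpers rather than the authors' code.

```python
import random


def simulate_typos(descriptions, fraction=0.1, seed=0):
    """Return a copy in which roughly `fraction` of the descriptions
    receive one adjacent-character swap (an assumed typo model)."""
    rng = random.Random(seed)
    noisy = []
    for text in descriptions:
        if len(text) >= 2 and rng.random() < fraction:
            i = rng.randrange(len(text) - 1)
            # Swap the characters at positions i and i + 1.
            text = text[:i] + text[i + 1] + text[i] + text[i + 2:]
        noisy.append(text)
    return noisy


def top_k_recall(ranked_predictions, gold_items, k=5):
    """Fraction of examples whose gold item appears among the
    first k entries of its ranked prediction list."""
    if not gold_items:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_items)
        if gold in ranked[:k]
    )
    return hits / len(gold_items)


# Toy data (illustrative only, not from the paper).
ranked = [["apple", "pear", "fig"], ["milk", "tea", "rice"]]
gold = ["fig", "bread"]
print(top_k_recall(ranked, gold, k=5))  # 0.5
```

In this toy case only the first example contains its gold item in the ranked list, so the recall is 0.5. A character-based model would be evaluated by ranking foods for the output of `simulate_typos` and comparing its recall against a word-only baseline.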
Similar resources
Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
Reranking models have been successfully applied to many tasks of Natural Language Processing. However, there are two aspects of this approach that need a deeper investigation: (i) Assessment of hypotheses generated for reranking at classification phase: baseline models generate a list of hypotheses and these are used for reranking without any assessment; (ii) Detection of cases where reranking ...
Linear Reranking Model for Chinese Pinyin-to-Character Conversion
Pinyin-to-character conversion is an important task in Chinese natural language processing. Previous work mainly focused on n-gram language models and machine learning approaches, sometimes with additional hand-crafted or automatic rule-based post-processing. Word n-gram language models are unable to solve two problems: out-of-vocabulary word recognition and long-distance grammatical c...
Survey on Three Reranking Models for Discriminative Parsing
This survey is inspired by the so-called reranking techniques in natural language processing (NLP). The aim of this survey is to provide an overview of three main reranking tasks particularly for discriminative parsing. We will focus on the motivation for discriminative reranking, on the three models, boosting model, support vector machine (SVM) model and voted perceptron model, on the procedur...
Low-Dimensional Discriminative Reranking
The accuracy of many natural language processing tasks can be improved by a reranking step, which involves selecting a single output from a list of candidate outputs generated by a baseline system. We propose a novel family of reranking algorithms based on learning separate low-dimensional embeddings of the task’s input and output spaces. This embedding is learned in such a way that prediction ...
Character-level Intra Attention Network for Natural Language Inference
Natural language inference (NLI) is a central problem in language understanding. End-to-end artificial neural networks have recently reached state-of-the-art performance in the NLI field. In this paper, we propose the Character-level Intra Attention Network (CIAN) for the NLI task. In our model, we use the character-level convolutional network to replace the standard word embedding layer, and we use the...